A Text Classifier Based on Sentence Category VSM

نویسندگان

  • Yunliang Zhang
  • Quan Zhang
چکیده

VSM is a mature model of text representation for categorization. Words are commonly used as dimensions of feature space of VSM, but words only provide little semantic information. Sentence category theory is an important component of HNC theory and can provide abundant information about meaning, structure and style of a sentence. We use sentence categories as dimensions of feature space, reduce the dimensionality by dividing mixed sentence categories and reform the weights by tfc-weighting algorithm. By simple vector distance calculation, we can get the parameters of the classifier and execute the categorization. The average precision and recall of our classifier are acceptable and can be improved by other HNC techniques.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for consequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, extraction of the names of the molecules and their properties. Improvement in the performance of such systems may affects the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

متن کامل

Approach for Text Classification Based on the Similarity Measurement between Normal Cloud Models

The similarity between objects is the core research area of data mining. In order to reduce the interference of the uncertainty of nature language, a similarity measurement between normal cloud models is adopted to text classification research. On this basis, a novel text classifier based on cloud concept jumping up (CCJU-TC) is proposed. It can efficiently accomplish conversion between qualita...

متن کامل

A Customizable Text Classifier for Text Mining

Text mining deals with complex and unstructured texts. Usually a particular collection of texts that is specified to one or more domains is necessary. We have developed a customizable text classifier for users to mine the collection automatically. It derives from the sentence category of the HNC theory and corresponding techniques. It can start with a few texts, and it can adjust automatically ...

متن کامل

A Novel Approach for Ontology-Based Feature Vector Generation for Web Text Document Classification

Thetaskofextractingtheusedfeaturevectorinminingtasks(classification,clustering...etc.)is consideredthemostimportanttaskforenhancingthetextprocessingcapabilities.Thispaperproposes anovelapproachtobeusedinbuildingthefeaturevectorusedinwebtextdocumentclassification process;addingsemanticsinthegeneratedfeaturevector.Thisapproachisbasedonuti...

متن کامل

A Vector Space Model for Subjectivity Classification in Urdu aided by Co-Training

The goal of this work is to produce a classifier that can distinguish subjective sentences from objective sentences for the Urdu language. The amount of labeled data required for training automatic classifiers can be highly imbalanced especially in the multilingual paradigm as generating annotations is an expensive task. In this work, we propose a cotraining approach for subjectivity analysis i...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006